Fast high average-utility itemset mining algorithm based on utility-list structure
WANG Jinghua, LUO Xiangzhou, WU Qian
Journal of Computer Applications    2016, 36 (11): 3062-3066.   DOI: 10.11772/j.issn.1001-9081.2016.11.3062
High utility itemset mining has been widely studied in data mining, but it does not account for the effect of itemset length. High average-utility itemset mining was proposed to address this issue. Existing high average-utility itemset mining algorithms, however, take a long time to mine the high average-utility itemsets. To solve this problem, an improved high average-utility itemset mining algorithm named FHAUI (Fast High Average Utility Itemset) was proposed. FHAUI stores utility information in a utility-list structure and mines all high average-utility itemsets from that structure. FHAUI also uses a two-dimensional matrix to reduce the number of join operations. Experimental results on several classical datasets show that FHAUI greatly reduces the number of join operations and lowers the running time.
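The utility-list idea can be illustrated with a minimal Python sketch. This is not the FHAUI algorithm: it enumerates every candidate itemset by brute force and omits both the pruning bounds and the two-dimensional matrix that the abstract mentions. The toy database, the function names, and the simplified utility-list layout (transaction id mapped to utility) are all assumptions made for illustration. The average utility of an itemset is taken here as its total utility over the transactions that contain it, divided by the number of items in the itemset.

```python
from itertools import combinations

# Hypothetical toy database: transaction id -> {item: utility of that item
# in the transaction}. These values are made up for illustration.
transactions = {
    1: {"a": 5, "b": 2, "c": 1},
    2: {"a": 10, "c": 6},
    3: {"b": 4, "c": 3},
}

def build_utility_lists(db):
    """Map each single item to a simplified utility-list {tid: utility}."""
    lists = {}
    for tid, items in db.items():
        for item, util in items.items():
            lists.setdefault(item, {})[tid] = util
    return lists

def join(list_x, list_y):
    """Join two utility-lists of disjoint itemsets.

    Only transactions present in both lists are kept, and the utilities
    are added together.
    """
    shared = list_x.keys() & list_y.keys()
    return {tid: list_x[tid] + list_y[tid] for tid in shared}

def average_utility(itemset, ulist):
    """Total utility of the itemset divided by its length."""
    return sum(ulist.values()) / len(itemset)

def mine(db, min_avg_utility):
    """Brute-force enumeration of high average-utility itemsets."""
    singles = build_utility_lists(db)
    items = sorted(singles)
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            ulist = singles[itemset[0]]
            for item in itemset[1:]:
                ulist = join(ulist, singles[item])
            if ulist:
                au = average_utility(itemset, ulist)
                if au >= min_avg_utility:
                    result[itemset] = au
    return result

high = mine(transactions, min_avg_utility=10)
```

Every candidate with more than one item costs at least one join in this sketch, which is the kind of work the abstract says FHAUI reduces.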
Project keyword lexicon and keyword semantic network based on word co-occurrence matrix
WANG Qing, CHEN Zeya, GUO Jing, CHEN Xi, WANG Jinghua
Journal of Computer Applications    2015, 35 (6): 1649-1653.   DOI: 10.11772/j.issn.1001-9081.2015.06.1649
To address keyword extraction and the construction of a project keyword lexicon for technological projects in professional fields, an algorithm for building the lexicon based on semantic relations and a co-occurrence matrix was proposed. Building on conventional keyword extraction based on a co-occurrence matrix, the algorithm improves the traditional approach by also considering the location, property and Inverse Document Frequency (IDF) of the keywords. A method was also given for establishing a keyword semantic network from the co-occurrence matrix and for identifying hot keywords by computing their similarity to a semantic base vector. Finally, 882 project documents from the power field were used in a simulation experiment. The experimental results show that the proposed algorithm can effectively extract keywords for technological projects and establish the keyword correlation network, and that it achieves better precision, recall and F1-score than the keyword extraction algorithm for Chinese text based on multi-feature fusion.
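As a rough illustration of the two building blocks named above, the following Python sketch counts keyword co-occurrences within documents and computes a plain IDF value per keyword. It is not the authors' algorithm: the sample documents, the function names, the document-level co-occurrence window and the unsmoothed IDF formula log(N / df) are all assumptions, and the location and property factors, the semantic network and the hot keyword step are left out.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical keyword lists, one per project document (made-up data).
docs = [
    ["transformer", "fault", "diagnosis"],
    ["transformer", "load", "forecast"],
    ["fault", "diagnosis", "relay"],
]

def cooccurrence(docs):
    """Count how many documents contain each unordered keyword pair.

    Pairs are stored as alphabetically sorted tuples, so the result is a
    sparse form of a symmetric co-occurrence matrix.
    """
    counts = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            counts[(a, b)] += 1
    return counts

def idf(docs):
    """Unsmoothed inverse document frequency: log(N / document frequency)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / freq) for term, freq in df.items()}

pair_counts = cooccurrence(docs)
idf_scores = idf(docs)
```

In this toy data, a pair such as ("diagnosis", "fault") appears in two documents and would form a stronger edge in a keyword network than a pair that appears in only one.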